ABSTRACT

Given a set of people and a set of events they attend, we ad- 
dress the problem of measuring connectedness or tie strength 
between each pair of persons given that attendance at mu- 
tual events gives an implicit social network between people. 
We take an axiomatic approach to this problem. Starting 
from a list of axioms that a measure of tie strength must 
satisfy, we characterize functions that satisfy all the axioms 
and show that there is a range of measures that satisfy this 
characterization. A measure of tie strength induces a rank- 
ing on the edges (and on the set of neighbors for every per- 
son). We show that for applications where the ranking, and 
not the absolute value of the tie strength, is the important 
thing about the measure, the axioms are equivalent to a natural
partial order. We also show that, to settle on a particular measure,
we must make a non-obvious decision about extending this
partial order to a total order, and that this decision is best
left to particular applications. We classify measures found
in prior literature according to the axioms that they sat- 
isfy. In our experiments, we measure tie strength and the 
coverage of our axioms in several datasets. Also, for each 
dataset, we bound the maximum Kendall's Tau divergence 
(which measures the number of pairwise disagreements be- 
tween two lists) between all measures that satisfy the axioms 
using the partial order. This tells us whether a particular dataset
is well behaved, so that we need not worry about which
measure to choose, or whether we must be careful about the exact
choice of measure.

Keywords 

Social Networks, Tie Strength, Axiomatic Approach 

1. INTRODUCTION 

Explicitly declared friendship links suffer from a low signal 
to noise ratio (e.g. Facebook friends or Linkedin contacts). 
Links are added for a variety of reasons like reciprocation, 
peer pressure, etc. Detecting which of these links are important
is a challenge.

Social structures are implied by various interactions between 
users of a network. We look at event information, where 
users participate in mutual events. Our goal is to infer the 
strength of ties between various users given this event infor- 
mation. Hence, these social networks are implicit. 

There has been a surge of interest in implicit social net- 
works. We can see anecdotal evidence for this in startups 
like COLOR (http://www.color.com) and new features in 
products like Gmail. COLOR builds an implicit social network
based on people's proximity information while taking
photos. Gmail's "Don't forget Bob" feature [Roth et al., 2010]
uses an implicit social network to suggest new people to add
to an email given an existing list.

People attend different events with each other. In fact, an 
event is defined by the set of people that attend it. An 
event can represent the set of people who took a photo at 
the same place and time, like COLOR, or a set of people 
who are on an email, like in Gmail. Given the set of events, 
we would like to infer how connected two people are, i.e. we 
would like to measure the strength of the tie between people. 
All that is known about each event is the list of people who 
attended it. People attend events based on an implicit social 
network with ties between pairs of people. We want to solve 
the inference problem of finding this weighted social network 
that gives rise to the set of events. 

Given a bipartite graph, with people as one set of vertices
and events as the other set, we want to infer the tie strength
between each pair of people. In our problem, we do
not have access to any directly declared social network
between people; the social network is implicit. We
want to infer the network based on the sets of people who
interact together at different points in time.

We start with a set of axioms and find a characterization 
of functions that could serve as a measure of tie strength, 
just given the event information. We do not end up with a 
single function that works best under all circumstances, and 
in fact we show that there are non-obvious decisions that 
need to be made to settle down on a single measure of tie 
strength. 
Moreover, we examine the case where the absolute value of
the tie strength is not important, just the order is important 
(see Section 4.2.1). We show that in this case the axioms are 
equivalent to a natural partial order on the strength of ties. 
We also show that choosing a particular tie strength function 
is equivalent to choosing a particular linear extension of this 
partial order. 

Our contributions are: 

• We present an axiomatic approach to the problem of 
inferring implicit social networks by measuring tie strength. 

• We characterize functions that satisfy all the axioms 
and show a range of measures that satisfy this charac- 
terization. 

• We show that in ranking applications, the axioms are 
equivalent to a natural partial order; we demonstrate 
that to settle on a particular measure, we must make 
non-obvious decisions about extending this partial or- 
der to a total order which is best left to the particular 
application. 

• We classify measures found in prior literature accord- 
ing to the axioms that they satisfy. 

• In our experiments, we show that by using Kendall's
Tau divergence, we can judge whether a dataset is well
behaved, in which case we do not have to worry about which
tie-strength measure to choose, or whether we have to be careful
about the exact choice of measure.

The remainder of this paper is structured as follows. Sec- 
tion 2 outlines the related work. Section 3 presents our 
proposed model. Sections 4 and 5 describe the axioms and 
measures of tie strength, respectively. Section 6 presents our 
experiments. Section 7 concludes the paper. 

2. RELATED WORK 

[Granovetter, 1973] introduced the notion of strength of ties
in social networks, which has since influenced different areas
of study. We split the related work into subsections
that emphasize particular methods or applications.

Strength of Ties: [Granovetter, 1973] showed that weak 
ties are important for various aspects like spread of infor- 
mation in social networks. There have been various studies 
on identifying the strength of ties given different features of 
a graph. [Gilbert and Karahalios, 2009] model tie strength 
as a linear combination of node attributes like intensity, in- 
timacy, etc to classify ties in a social network as strong or 
weak. The weights on each attribute enable them to find 
attributes that are most useful in making these predictions. 
[Kahanda and Neville, 2009] take a supervised learning ap- 
proach to the problem by constructing a predictor that de- 
termines whether a link in a social network is a strong tie 
or a weak tie. They report that network transactional features,
which combine network structure with transactional
features such as the number of wall posts and photos (for example,
the number of posts from one user to another normalized by the
total number of posts by the first user), are the best predictors.

Link Prediction: [Adamic and Adar, 2003] considers the
problem of predicting links between web pages of individuals,
using information such as membership of mailing lists
and use of common phrases on web pages. They define a
measure of similarity between users by creating a bipartite
graph of users on the left and features (e.g., phrases and
mailing lists) on the right, as
$w(u, v) = \sum_{i \,:\, i \text{ neighbor of } u \,\&\, v} \frac{1}{\log(\mathrm{degree}(i))}$.
[Liben-Nowell and Kleinberg, 2003] formalizes the problem 
of predicting which new interactions will occur in a social 
network given a snapshot of the current state of the net- 
work. It uses many existing predictors of similarity between 
nodes like [Adamic and Adar, 2003, Jeh and Widom, 2002, 
Katz, 1953] and generates a ranking of pairs of nodes that 
are currently not connected by an edge. It compares across
different datasets to measure the efficacy of these measures.
Its main finding is that there is enough information in the 
network structure that all the predictors handily beat the 
random predictor, but not enough that the absolute number
of predictions is high. [Allali, Magnien, and Latapy,
2011] addresses the problem of predicting links in a bipartite
network. They define internal links as links between left
nodes that have a right node in common, i.e. nodes that are at
distance two from each other, and predictions are
offered only for internal links.

Email networks: Because of the ubiquitous nature of 
email, there has been a lot of work on various aspects of 
email networks. [Roth, Ben-David, Deutscher, Flysher, Horn, 
Leichtberg, Leiser, Matias, and Merom, 2010] discusses a 
way to suggest more recipients for an email given the sender 
and the current set of recipients. This feature has been integrated
into Google's popular Gmail service. [Kahanda and
Neville, 2009] constructs a regression model for classifying
edges in a social network as strong or weak. They achieve
high accuracy and find that network-transactional features,
like the number of posts from u to v normalized by the total
number of posts by u, achieve the largest gain in prediction
accuracy.

Axiomatic approach to Similarity: [Altman and Ten- 
nenholtz, 2005] were one of the first to axiomatize graph 
measures. In particular, they studied axiomatizing PageR- 
ank. The closest in spirit to our work is the work by Lin [Lin, 
1998] that defines an information theoretic measure of simi- 
larity. This measure depends on the existence of a probabil- 
ity distribution on the features that define objects. While 
the measure of tie strength between people is similar to a 
measure of similarity, there are important differences. We 
do not have any probability distribution over events, just a 
log of the ones that occurred. More importantly, [Lin, 1998] 
defines items by the attributes or features they have. Hence, 
items with the same features are identical. In our case, even 
if two people attend all the same events, they are not the 
same person, and in fact they might not even have very high 
tie strength depending on how large the events were. 

3. MODEL 

We model people and events as nodes and use a bipartite
graph $G = (L \cup R, E)$ where the edges represent membership.
The left vertices correspond to people while the right
vertices correspond to events. We ignore any information
other than the set of people who attended the events, like
the timing, location, and importance of events. These are features
that would be important to the overall goal of measuring
tie strength between users, but in this work we focus on
the task of inferring tie strength using the graph structure
only. We shall denote users in $L$ by small letters ($u, v, \ldots$)
and events in $R$ by capital letters ($P, Q, \ldots$). There is an
edge between $u$ and $P$ if and only if $u$ attended event $P$.
Hence, our problem is to find a function on bipartite graphs
that models tie strength between people, given this bipartite
graph representation of events.
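
As a minimal sketch of this representation, the bipartite graph can be stored as a mapping from each person to the set of events attended. The event labels, the person names and the helper names `build_graph` and `common_events` are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict


def build_graph(events):
    """Build the person -> events adjacency of the bipartite graph.

    `events` maps an event label (a vertex in R) to the set of people
    (vertices in L) who attended it.
    """
    attended = defaultdict(set)
    for event, people in events.items():
        for person in people:
            attended[person].add(event)
    return dict(attended)


def common_events(attended, u, v):
    """Return the set of events that both u and v attended."""
    return attended.get(u, set()) & attended.get(v, set())


# Hypothetical toy data: three events and five people.
events = {
    "P": {"a", "b"},
    "Q": {"b", "c", "d"},
    "R": {"c", "d", "e"},
}
attended = build_graph(events)

# Events shared by c and d, and the sizes of those events.
shared = common_events(attended, "c", "d")
sizes = sorted(len(events[e]) for e in shared)
```

The sorted list of shared event sizes is the only information about a pair that the structural measures in this paper rely on, which is why it is computed explicitly here.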

Figure 1: Given a bipartite person × event graph (input),
we want to infer the induced partial order of tie
strength among the people (output).

We also introduce some notation. We shall denote the tie
strength of $u$ and $v$ due to a graph $G$ as $TS_G(u,v)$, or as
$TS(u,v)$ if $G$ is obvious from context. We shall also use
$TS_{\{E_1,\ldots,E_k\}}(u,v)$ to denote the tie strength between $u$ and
$v$ in the graph induced by events $\{E_1, \ldots, E_k\}$ and users
that attend at least one of these events. For a single event
$E$, $TS_E(u,v)$ denotes the tie strength between $u$ and $v$
if $E$ were the only event.

We denote the set of natural numbers by $\mathbb{N}$. A sequence
of $k$ natural numbers is given by $(a_1, \ldots, a_k)$ and the set of
all such sequences is $\mathbb{N}^k$. The set of all finite sequences of
natural numbers is represented as $\mathbb{N}^* = \bigcup_k \mathbb{N}^k$.

4. AXIOMS OF TIE STRENGTH 

We now discuss the axioms that measures of tie strength 
between two users u and v must follow. 

Axiom 1 (Isomorphism) Suppose we have two graphs $G$
and $H$ and a mapping of vertices such that $G$ and $H$
are isomorphic. Let vertex $u$ of $G$ map to vertex $a$ of
$H$ and vertex $v$ to $b$. Then $TS_G(u,v) = TS_H(a,b)$.
Hence, the tie strength between $u$ and $v$ does not depend
on the labels of $u$ and $v$, only on the link structure.

Axiom 2 (Baseline) If there are no events, then the tie
strength between each pair $u$ and $v$ is 0: $TS_{\emptyset}(u,v) = 0$.
If there are only two people $u$ and $v$ and a single
party which they attend, then their tie strength is 1:
$TS_{\{u,v\}}(u,v) = 1$.

Axiom 3 (Frequency: More events create stronger ties)
All other things being equal, the more events common
to $u$ and $v$, the stronger the tie strength of $u$ and $v$.
Consider a graph $G = (L \cup R, E)$ and two vertices $u, v \in L$,
and the graph $G' = (L \cup (R \cup P), E \cup P_{u,v,\ldots})$,
where $P_{u,v,\ldots}$ is a new event which both $u$ and $v$ attend.
Then $TS_{G'}(u,v) \geq TS_G(u,v)$.

Axiom 4 (Intimacy: Smaller events create stronger ties)
All other things being equal, the fewer invitees there
are to any particular party attended by $u$ and $v$, the
stronger the tie strength between $u$ and $v$.
Consider a graph $G = (L \cup R, E)$ such that $P \in R$ and
$(P,u), (P,v), (P,w) \in E$ for some vertex $w$, and
the graph $G' = (L \cup R, E - (P,w))$, where the edge
$(P,w)$ is deleted. Then $TS_{G'}(u,v) \geq TS_G(u,v)$.

Axiom 5 (Larger events create more ties) Consider two 
events P and Q. If the number of people attending P 
is larger than the number of people attending Q, then 
the total tie strength created by event P is more than 
that created by event Q. 

$|P| \geq |Q| \implies \sum_{u,v \in P} TS_P(u,v) \geq \sum_{u,v \in Q} TS_Q(u,v)$.

Axiom 6 (Conditional Independence of Vertices) The
tie strength of a vertex $u$ to other vertices does not depend
on events that $u$ does not attend; it only depends
on events that $u$ attends.

Axiom 7 (Conditional Independence of Events) The
increase in tie strength between $u$ and $v$ due to an event
$P$ does not depend on other events, just on the existing
tie strength between $u$ and $v$:
$TS_{G+P}(u,v) = g(TS_G(u,v), TS_P(u,v))$ for some fixed
monotonically increasing function $g$.

Axiom 8 (Submodularity) The marginal increase in tie
strength of $u$ and $v$ due to an event $Q$ is at most the
tie strength between $u$ and $v$ if $Q$ was their only event.
If $G$ is a graph and $Q$ is a single event, $TS_G(u,v) +
TS_Q(u,v) \geq TS_{G+Q}(u,v)$.
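
The axioms can be spot-checked numerically for a candidate measure. The sketch below does this for one candidate, the sum of $1/\binom{|P|}{2}$ over shared events (the Delta measure discussed later in the paper), on a small hypothetical graph. The graph, the function name `delta_ts` and the reading of the inequalities as non-strict are assumptions made for illustration; a check on one example is not a proof that the axioms hold in general.

```python
from fractions import Fraction
from math import comb


def delta_ts(events, u, v):
    """Candidate measure: sum of 1 / C(|P|, 2) over events P attended by both u and v."""
    return sum(
        (Fraction(1, comb(len(people), 2))
         for people in events.values()
         if u in people and v in people),
        Fraction(0),
    )


# Hypothetical toy graph: event label -> set of attendees.
base = {"P": {"u", "v", "w"}, "Q": {"u", "x"}}

# Axiom 2 (Baseline): no events gives 0; a single two-person event gives 1.
baseline_holds = (delta_ts({}, "u", "v") == 0
                  and delta_ts({"P": {"u", "v"}}, "u", "v") == 1)

# Axiom 3 (Frequency): add a new event N that both u and v attend.
more = {**base, "N": {"u", "v", "y"}}
frequency_holds = delta_ts(more, "u", "v") >= delta_ts(base, "u", "v")

# Axiom 4 (Intimacy): remove w from event P.
smaller = {**base, "P": {"u", "v"}}
intimacy_holds = delta_ts(smaller, "u", "v") >= delta_ts(base, "u", "v")

# Axiom 8 (Submodularity): TS_G + TS_N >= TS_{G+N} for the single event N.
single = {"N": {"u", "v", "y"}}
submodular_holds = (delta_ts(base, "u", "v") + delta_ts(single, "u", "v")
                    >= delta_ts(more, "u", "v"))
```

Exact rational arithmetic (`Fraction`) is used so that the comparisons are not affected by floating-point rounding.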

Discussion 

These axioms give a measure of tie strength between nodes 
that is positive but unbounded. Nodes that have a higher 
value are closer to each other than nodes that have lower 
value. 

We get a sense of the axioms by applying them to Figure 1.
Axiom 1 (Isomorphism) implies that $TS(b,c) = TS(b,d)$ and
$TS(c,e) = TS(d,e)$. Axiom 2 (Baseline), Axiom 6 (Conditional
Independence of Vertices) and Axiom 7 (Conditional
Independence of Events) imply that $TS(a,c) = TS(a,d) =
TS(a,e) = TS(b,e) = 0$. Axiom 4 (Intimacy: Smaller events
create stronger ties) implies that $TS(a,b) \geq TS(d,e)$. Axiom 3
(Frequency: More events create stronger ties) implies
that $TS(c,d) \geq TS(d,e)$.

While each of the axioms above is fairly intuitive, they are
hardly trivial. In fact, we shall see that various measures
used in prior literature break some of these axioms. On the
other hand, it might seem that satisfying all the axioms is a
fairly strict condition. However, we shall see that even satisfying
all the axioms is not sufficient to uniquely identify
a measure of tie strength. The axioms leave considerable
space for different measures of tie strength.

One reason the axioms do not define a particular function
is that there is inherent tension between Axiom 4 (Intimacy:
Smaller events create stronger ties) and Axiom 3
(Frequency: More events create stronger ties). While both
state ways in which tie strength becomes stronger, the axioms
do not resolve which one dominates the other or how
they interact with each other. This is a non-obvious decision
that we feel is best left to the application in question. In
Figure 1, we cannot tell using just Axioms (1-8) which of
$TS(a,b)$ and $TS(c,d)$ is larger. We discuss this more
in Section 4.2.

4.1 Characterizing Tie Strength 

In this section, we shall state and prove Theorem 6 that 
gives a characterization of all functions that satisfy the ax- 
ioms of tie strength. Axioms (1-8) do not uniquely define 
a function, and in fact, one of the reasons that tie strength 
is not uniquely defined up to the given axioms is that we 
do not have any notion for comparing the relative impor- 
tance of number of events (frequency) versus the exclusivity 
of events (intimacy). For example, in terms of the partial 
order, it is not clear whether u and v having in common 
two events with two people attending them is better than 
or worse than u and v having three events in common with 
three people attending them. 
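
The example above can be made concrete with a short sketch. It compares a pair sharing two events of size two with a pair sharing three events of size three under two of the measures discussed later in the paper (Common Neighbors and Delta). The variable and function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

# Sizes of the events shared by each pair.
pair_a = [2, 2]      # two shared events, two attendees each
pair_b = [3, 3, 3]   # three shared events, three attendees each

measures = {
    # Common Neighbors: the number of shared events.
    "common_neighbors": lambda sizes: Fraction(len(sizes)),
    # Delta: sum of 1 / C(n, 2) over the shared events.
    "delta": lambda sizes: sum((Fraction(1, comb(n, 2)) for n in sizes),
                               Fraction(0)),
}


def stronger(measure):
    """Return which pair the measure ranks higher: 'a', 'b' or 'tie'."""
    a, b = measure(pair_a), measure(pair_b)
    return "a" if a > b else "b" if b > a else "tie"


verdicts = {name: stronger(m) for name, m in measures.items()}
```

Under these assumptions the two measures rank the pairs in opposite ways, which illustrates why the frequency versus intimacy trade-off has to be settled by the application.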

We shall use the following definition for deciding how much 
total tie strength a single event generates, given the size of 
the event. 

Notation 1. If there is a single event, with $k$ people, we
shall denote the total tie strength generated as $f(k)$.

Lemma 2 (Local Neighborhood). The tie strength of $u$ and
$v$ is affected only by events that both $u$ and $v$ attend.

Proof. Given a graph $G$ and users $u$ and $v$ in $G$, $G^{-u}$ is
obtained by deleting all events that $u$ is not a part of. Similarly,
$G^{-u,v}$ is obtained by deleting all events of $G^{-u}$ that
$v$ is not a part of. By Axiom 6 (Conditional Independence
of Vertices), the tie strength of $u$ only depends on events that
$u$ attends. Hence, $TS_G(u,v) = TS_{G^{-u}}(u,v)$. Also, the tie
strength of $v$ only depends on events that $v$ attends. Hence,
$TS_G(u,v) = TS_{G^{-u}}(u,v) = TS_{G^{-u,v}}(u,v)$. This proves our
claim. □

Lemma 3. The tie strength between any two people is always
non-negative and is equal to zero if they have never
attended an event together.

Proof. If two people have never attended an event together,
then from Lemma 2 the tie strength remains unchanged if
we delete all the events not containing either, which in this
case is all the events. Then Axiom 2 (Baseline) tells us that
$TS(u,v) = 0$.

Also, Axiom 3 (Frequency: More events create stronger ties)
implies that $TS_G(u,v) \geq TS_{\emptyset}(u,v) = 0$. Hence, the tie
strength is always non-negative. □

Lemma 4. If there is a single party, with $k$ people, the tie
strength of each tie is equal to $\frac{f(k)}{\binom{k}{2}}$.

Proof. By Axiom 1 (Isomorphism), it follows that the tie
strength on each tie is the same. Since the sum of all the ties
is equal to $f(k)$, and there are $\binom{k}{2}$ edges, the tie strength of
each edge is equal to $\frac{f(k)}{\binom{k}{2}}$. □

Lemma 5. The total tie strength created at an event $E$
with $k$ people is a monotone function $f(k)$ that is bounded
by $1 \leq f(k) \leq \binom{k}{2}$.

Proof. By Axiom 4 (Intimacy: Smaller events create stronger
ties), the tie strength of $u$ and $v$ due to $E$ is at most that if
they were alone at the event: $TS_E(u,v) \leq TS_{\{u,v\}}(u,v) = 1$,
by the Baseline axiom. Summing over all ties gives
us that $\sum_{u,v} TS_E(u,v) \leq \binom{k}{2}$. Also, since larger events
generate more ties, $f(k) \geq f(i)$ for all $i < k$. In particular,
$f(k) \geq f(2) = 1$. This proves the result. □

We are now ready to state the main theorem in this section.

Theorem 6. Given a graph $G = (L \cup R, E)$ and two vertices
$u, v$, if the tie-strength function $TS$ follows Axioms (1-8),
then the function has to be of the form

$TS_G(u,v) = g(h(|P_1|), h(|P_2|), \ldots, h(|P_k|))$

where $\{P_i\}_{1 \leq i \leq k}$ are the events common to both $u$ and $v$,
$h : \mathbb{N} \to \mathbb{R}$ is a monotonically decreasing function bounded
by $1 \geq h(n) \geq \frac{1}{\binom{n}{2}}$, and $g : \mathbb{N}^* \to \mathbb{R}$ is a monotonically
increasing submodular function.

Proof. Given two users $u$ and $v$ we use Axioms (1-8) to
successively change the form of $TS_G(u,v)$. Let $\{P_i\}_{1 \leq i \leq k}$
be all the events common to $u$ and $v$. Axiom 7 (Conditional
Independence of Events) implies that $TS_G(u,v) =
g(TS_{P_i}(u,v))_{1 \leq i \leq k}$, where $g$ is a monotonically increasing
submodular function. Given an event $P$, $TS_P(u,v) = h(|P|) =
\frac{f(|P|)}{\binom{|P|}{2}}$. By Axiom 4 (Intimacy: Smaller events create stronger
ties), $h$ is a monotonically decreasing function. Also, by
Lemma 5, $f$ is bounded by $1 \leq f(n) \leq \binom{n}{2}$. Hence, $h$ is
bounded by $1 \geq h(n) \geq \frac{1}{\binom{n}{2}}$. This completes the proof of
the theorem. □

Theorem 6 gives us a way to explore the space of valid functions
for representing tie strength and find which work given
particular applications. In Section 5 we shall look at popular
measures of tie strength and show that most of them
follow Axioms (1-8) and hence are of the form described by
Theorem 6. We also describe the functions $h$ and $g$ that
characterize these common measures of tie strength. While
Theorem 6 gives a characterization of functions suitable for
describing tie strength, it leaves open a wide variety of
functions. In particular, it does not give the comfort of
having a single function that we could use. We discuss the
reasons for this and what we would need to do to settle upon
a particular function in the next section.



4.2 Tie Strength and Orderings 

We begin this section with a definition of order in a set. 



Definition 7 (Total Order). Given a set $S$ and a binary
relation $\leq_O$ on $S$, $\mathcal{O} = (S, \leq_O)$ is called a total order if
and only if it satisfies the following properties: (i) Total: for
every $u, v \in S$, $u \leq_O v$ or $v \leq_O u$; (ii) Anti-symmetric:
$u \leq_O v$ and $v \leq_O u \implies u = v$; (iii) Transitive: $u \leq_O v$
and $v \leq_O w \implies u \leq_O w$.

A total order is also called a linear order. 

Consider a measure $TS$ that assigns a measure of tie strength
to each pair of nodes $u, v$ given the events that all nodes
attend in the form of a graph $G$. Since $TS$ assigns a real
number to each edge and the set of reals is totally ordered,
$TS$ gives a total order on all the edges. In fact, the function
$TS$ actually gives a total ordering of $\mathbb{N}^*$. In particular, if we
fix a vertex $u$, then $TS$ induces a total order on the set of
neighbors of $u$, given by the increasing values of $TS$ on the
corresponding edges.

4.2.1 The Partial Order on N* 

Definition 8 (Partial Order). Given a set $S$ and a binary
relation $\leq_P$ on $S$, $\mathcal{P} = (S, \leq_P)$ is called a partial order if and
only if it satisfies the following properties: (i) Reflexive: for
every $u \in S$, $u \leq_P u$; (ii) Anti-symmetric: $u \leq_P v$ and $v \leq_P
u \implies u = v$; (iii) Transitive: $u \leq_P v$ and $v \leq_P w \implies
u \leq_P w$.

The set S is called a partially ordered set or a poset. 

Note the difference from a total order is that in a partial
order not every pair of elements is comparable. We shall
now look at a natural partial order $\mathcal{N} = (\mathbb{N}^*, \leq_{\mathcal{N}})$ on the
set $\mathbb{N}^*$ of all finite sequences of natural numbers. Recall that
$\mathbb{N}^* = \bigcup_k \mathbb{N}^k$. We shall think of such a sequence as the sizes
of the common events that a pair of users attend.

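Definition 9 (Partial order on $\mathbb{N}^*$). Let $a, b \in \mathbb{N}^*$ where
$a = (a_i)_{1 \leq i \leq A}$ and $b = (b_i)_{1 \leq i \leq B}$. We say that $a \geq_{\mathcal{N}} b$ if
and only if $A \geq B$ and $a_i \leq b_i$ for $1 \leq i \leq B$. This gives the
partial order $\mathcal{N} = (\mathbb{N}^*, \leq_{\mathcal{N}})$.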
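
The comparison in Definition 9 can be sketched in a few lines. The function names `dominates` and `comparable` are illustrative assumptions, and the sketch assumes that each sequence is sorted in ascending order before comparison, as the paper does when it associates sequences with pairs of users.

```python
def dominates(a, b):
    """Return True if a >=_N b in the sense of Definition 9.

    a has at least as many events as b, and the i-th smallest event
    of a is no larger than the i-th smallest event of b.
    """
    a, b = sorted(a), sorted(b)
    # zip stops after len(b) pairs, which are exactly the indices compared.
    return len(a) >= len(b) and all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    """Return True if the partial order relates a and b in either direction."""
    return dominates(a, b) or dominates(b, a)


# More events and smaller events: (2, 2, 3) dominates (2, 3).
more_and_smaller = dominates([2, 2, 3], [2, 3])

# Fewer but smaller events versus more but larger events: incomparable.
frequency_vs_intimacy = comparable([2, 2], [3, 3, 3])
```

The second example is the situation in which the axioms alone do not decide which tie is stronger.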

The partial order $\mathcal{N}$ corresponds to the intuition that more
events and smaller events create stronger ties. In fact, we
claim that this is exactly the partial order implied by
Axioms (1-8). Theorem 11 formalizes this intuition and
gives the proof. What we would really like is a total
ordering. Can we go from the partial ordering given by
Axioms (1-8) to a total order on $\mathbb{N}^*$? Theorem 11 also
suggests ways in which we can do this.

4.2.2 Partial Orderings and Linear Extensions 

In this section, we connect the definitions of partial order 
and the functions of tie strength that we are studying. First 
we start with a definition. 

Definition 10 (Linear Extension). $\mathcal{L} = (S, \leq_L)$ is called a
linear extension of a given partial order $\mathcal{P} = (S, \leq_P)$ if and
only if $\mathcal{L}$ is a total order and $\mathcal{L}$ is consistent with the ordering
defined by $\mathcal{P}$, that is, for all $u, v \in S$, $u \leq_P v \implies u \leq_L v$.

We are now ready to state the main theorem which characterizes
functions that satisfy Axioms (1-8) in terms of a
partial ordering on $\mathbb{N}^*$. Fix nodes $u$ and $v$ and let $P_1, \ldots, P_k$
be all the events that both $u$ and $v$ attend. Consider the
sequence of numbers $(|P_i|)_{1 \leq i \leq k}$ that give the number of
people in each of these events. Without loss of generality
assume that these are sorted in ascending order. Hence
$|P_i| \leq |P_{i+1}|$. We associate this sorted sequence of numbers
with the tie $(u,v)$. The partial order $\mathcal{N}$ induces a partial
order on the set of pairs via this mapping. We also call
this partial order $\mathcal{N}$. Fixing any particular measure of tie
strength gives a mapping of $\mathbb{N}^*$ to $\mathbb{R}$ and hence implies fixing
a particular linear extension of $\mathcal{N}$, and fixing a linear
extension of $\mathcal{N}$ involves making non-obvious decisions between
elements of the partial order. We formalize this in
the next theorem.

Theorem 11. Let $G = (L \cup R, E)$ be a bipartite graph of
users and events. Given two users $(u,v) \in (L \times L)$, let
$(|P_i|)_{1 \leq i \leq k}$, with each $P_i \in R$, be the sizes of the events
common to users $(u,v)$.
Through this association, the partial order $\mathcal{N} = (\mathbb{N}^*, \leq_{\mathcal{N}})$ on
finite sequences of numbers induces a partial order on $L \times L$
which we also call $\mathcal{N}$.

Let $TS$ be a function that satisfies Axioms (1-8). Then $TS$
induces a total order on the edges that is a linear extension
of the partial order $\mathcal{N}$ on $L \times L$.

Conversely, for every linear extension $\mathcal{L}$ of the partial order
$\mathcal{N}$, we can find a function $TS$ that induces $\mathcal{L}$ on $L \times L$ and
that satisfies Axioms (1-8).

Proof. $TS : L \times L \to \mathbb{R}$. Hence, it gives a total order on the
set of pairs of users. We want to show that if $TS$ satisfies
Axioms (1-8), then the total order is a linear extension of
$\mathcal{N}$. The characterization in Theorem 6 states that given a
pair of vertices $(u,v) \in (L \times L)$, $TS(u,v)$ is characterized
by the number of users in events common to $u$ and $v$ and
can be expressed as $TS_G(u,v) = g(h(|P_i|))_{1 \leq i \leq k}$ where $g$
is a monotone submodular function and $h$ is a monotone
decreasing function. Since $TS : L \times L \to \mathbb{R}$, it induces a
total order on all pairs of users. We now show that this is
consistent with the partial order $\mathcal{N}$. Consider two pairs
$(u_1, v_1)$, $(u_2, v_2)$ with party profiles $a = (a_1, \ldots, a_A)$ and $b =
(b_1, \ldots, b_B)$.

Suppose $a \geq_{\mathcal{N}} b$. We want to show that $TS(u_1, v_1) \geq
TS(u_2, v_2)$. $a \geq_{\mathcal{N}} b$ implies that $A \geq B$ and that $a_i \leq
b_i$ for all $1 \leq i \leq B$.

$TS(u_1, v_1) = g(h(a_1), \ldots, h(a_A))$
$\geq g(h(a_1), \ldots, h(a_B))$ (since $g$ is monotone and $A \geq B$)
$\geq g(h(b_1), \ldots, h(b_B))$ (since $g$ is monotone and $h(a_i) \geq h(b_i)$ because $a_i \leq b_i$)
$= TS(u_2, v_2)$

This proves the first part of the theorem. 

For the converse, we are given a total ordering $\mathcal{L} = (\mathbb{N}^*, \leq_L)$
that is an extension of the partial order $\mathcal{N}$. We want to
prove that there exists a tie strength function $TS : L \times L \to
\mathbb{R}$ that satisfies Axioms (1-8) and that induces $\mathcal{L}$ on $L \times L$.
We shall prove this by constructing such a function. We
shall define a function $f : \mathbb{N}^* \to \mathbb{Q}$ and define $TS_G(u,v) =
f(a_1, \ldots, a_k)$, where $a_i = |P_i|$ is the number of users that
attend event $P_i$ in $G$.

Define $f(n) = \frac{1}{n-1}$ and $f(\emptyset) = 0$. Hence, $TS_{\emptyset}(u,v) =
f(\emptyset) = 0$ and $TS_{\{u,v\}}(u,v) = f(2) = \frac{1}{2-1} = 1$. This
shows that $TS$ satisfies Axiom 2 (Baseline). Also, define
$f(\underbrace{1, 1, \ldots, 1}_{n}) = n$. Since $\mathbb{N}^*$ is countable, consider elements
in some order. If, for the current element $a$ under consideration,
there exists an element $b$ equivalent to $a$ for which we have
already defined $TS(b)$, then define $TS(a) = TS(b)$. Else,
let $a_{glb} = \arg\max_e \{TS(e) \text{ is defined and } a \geq_{\mathcal{N}} e\}$ and
let $a_{lub} = \arg\min_e \{TS(e) \text{ is defined and } a \leq_{\mathcal{N}} e\}$. Since
at every point the sets over which we take the maximum or
minimum are finite, both $a_{glb}$ and $a_{lub}$ are well defined and
exist. Define $TS(a) = \frac{1}{2}(TS(a_{glb}) + TS(a_{lub}))$. □
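
The first direction of the theorem can be checked by brute force on a small family of sequences. The sketch below enumerates all sorted sequences of up to three event sizes drawn from {2, 3, 4} and verifies that two measures of the form $g(h(\cdot))$, namely Delta and Max from Section 5, never contradict the partial order. The enumeration bounds and all function names are assumptions chosen for illustration; this is a finite sanity check, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb


def dominates(a, b):
    """a >=_N b: at least as many events, each no larger position by position."""
    a, b = sorted(a), sorted(b)
    return len(a) >= len(b) and all(x <= y for x, y in zip(a, b))


def delta(profile):
    """g = sum, h(n) = 1 / C(n, 2)."""
    return sum((Fraction(1, comb(n, 2)) for n in profile), Fraction(0))


def max_measure(profile):
    """g = max, h(n) = 1 / n; the empty profile has tie strength 0."""
    return max((Fraction(1, n) for n in profile), default=Fraction(0))


# All sorted profiles with 0 to 3 shared events of size 2, 3 or 4.
profiles = [p for k in range(0, 4)
            for p in combinations_with_replacement(range(2, 5), k)]


def violations(ts):
    """Pairs (a, b) with a >=_N b but ts(a) < ts(b)."""
    return [(a, b) for a in profiles for b in profiles
            if dominates(a, b) and ts(a) < ts(b)]


# Pairs that the partial order leaves undecided.
incomparable = [(a, b) for a in profiles for b in profiles
                if not dominates(a, b) and not dominates(b, a)]
```

Both measures produce no violations on this family, while the non-empty list of incomparable pairs shows where different linear extensions are free to disagree.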

In this abstract framework, an intuitively appealing linear 
extension is the random linear extension of the partial order 
under consideration. There are polynomial time algorithms 
to calculate this [Karzanov and Khachiyan, 1991]. We leave
the analysis of its analytical properties and its viability as
a tie strength function in real-world applications as an open
research question.

In the next section, we turn our attention to actual measures 
of tie strength. We see some popular measures that have 
been proposed before as well as some new ones. 



5. MEASURES OF TIE STRENGTH

There have been plenty of tie-strength measures discussed
in previous literature. We review the most popular of them
here and classify them according to the axioms they satisfy.
In this section, for an event $P$, we denote by $|P|$ the number
of people in the event $P$. The size of $P$'s neighborhood is
represented by $|\Gamma(P)|$.

Common Neighbors. This is the simplest measure of tie
strength, given by the total number of common events
that both $u$ and $v$ attended.

$TS(u,v) = |\Gamma(u) \cap \Gamma(v)|$

Jaccard Index. A more refined measure of tie strength is
given by the Jaccard Index, which normalizes for how
"social" $u$ and $v$ are.

$TS(u,v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$

Delta. Tie strength increases with the number of events.

$TS(u,v) = \sum_{P \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\binom{|P|}{2}}$

Adamic and Adar. This measure was introduced in [Adamic
and Adar, 2003].

$TS(u,v) = \sum_{P \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |P|}$

Linear. Tie strength increases with number of events.

$TS(u,v) = \sum_{P \in \Gamma(u) \cap \Gamma(v)} \frac{1}{|P|}$

Preferential attachment.

$TS(u,v) = |\Gamma(u)| \cdot |\Gamma(v)|$

Katz Measure. This was introduced in [Katz, 1953]. It
counts the number of paths between $u$ and $v$, where
each path is discounted exponentially by the length of
path.

$TS(u,v) = \sum_{q \in \text{path between } u, v} \gamma^{-|q|}$

Random Walk with Restarts. This gives a non-symmetric
measure of tie strength. For a node $u$, we jump with
probability $\alpha$ to node $u$ and with probability $1 - \alpha$ to a
neighbor of the current node. $\alpha$ is the restart probability.
The tie strength between $u$ and $v$ is the stationary
probability that we end at node $v$ under this process.

Simrank. This captures the similarity between two nodes $u$
and $v$ by recursively computing the similarity of their
neighbors.

$TS(u,v) = \begin{cases} 1 & \text{if } u = v \\ \gamma \cdot \frac{\sum_{a \in \Gamma(u)} \sum_{b \in \Gamma(v)} TS(a,b)}{|\Gamma(u)| \cdot |\Gamma(v)|} & \text{otherwise} \end{cases}$
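
The neighborhood-based measures above differ in which events they look at. The following sketch computes Common Neighbors and the Jaccard Index on a hypothetical graph before and after $v$ attends one extra event without $u$. The data and the helper names `neighborhoods`, `common_neighbors` and `jaccard` are assumptions made for illustration.

```python
from fractions import Fraction


def neighborhoods(events):
    """Map each person to the set of events attended (the person's neighborhood)."""
    gamma = {}
    for event, people in events.items():
        for person in people:
            gamma.setdefault(person, set()).add(event)
    return gamma


def common_neighbors(gamma, u, v):
    """Number of events attended by both u and v."""
    return len(gamma[u] & gamma[v])


def jaccard(gamma, u, v):
    """Shared events divided by all events attended by u or v."""
    union = gamma[u] | gamma[v]
    return Fraction(len(gamma[u] & gamma[v]), len(union)) if union else Fraction(0)


# Hypothetical data: the second graph adds an event S that v attends without u.
before = neighborhoods({"P": {"u", "v"}, "Q": {"u", "v", "w"}})
after = neighborhoods({"P": {"u", "v"}, "Q": {"u", "v", "w"}, "S": {"v", "x"}})

cn_before = common_neighbors(before, "u", "v")
cn_after = common_neighbors(after, "u", "v")
jac_before = jaccard(before, "u", "v")
jac_after = jaccard(after, "u", "v")
```

In this example Common Neighbors is unchanged by the extra event, while the Jaccard Index drops because it also depends on events that only one of the two people attends.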



Now, we shall introduce three new measures of tie strength.
In a sense, $g = \sum$ is at one extreme of the range of functions
allowed by Theorem 6 and that is the default function used.
$g = \max$ is at the other extreme of the range of functions.

Max. Tie strength does not increase with number of events.

$TS(u,v) = \max_{P \in \Gamma(u) \cap \Gamma(v)} \frac{1}{|P|}$



Proportional. Tie strength increases with number of events.
People spend time proportional to their TS in a party.
TS is the fixed point of this set of equations:

$TS(u,v) = \sum_{P \in \Gamma(u) \cap \Gamma(v)} \frac{\epsilon}{|P|} + (1 - \epsilon) \frac{TS(u,v)}{\sum_{w \in \Gamma(P)} TS(u,w)}$

Temporal Proportional. This is similar to Proportional,
but with a temporal aspect. TS is not a fixed point,
but starts with a default value and is changed according
to the following equation, where the events are
ordered by time.

$TS(u,v,t) = \begin{cases} TS(u,v,t-1) & \text{if } u \text{ and } v \text{ do not attend } P_t \\ \frac{\epsilon}{|P_t|} + (1 - \epsilon) \frac{TS(u,v,t-1)}{\sum_{w \in P_t} TS(u,w,t-1)} & \text{otherwise} \end{cases}$



Columns A1 to A8 refer to Axiom 1 (Isomorphism), Axiom 2 (Baseline),
Axiom 3 (Frequency: More events create stronger ties), Axiom 4 (Intimacy:
Smaller events create stronger ties), Axiom 5 (Larger events create more
ties), Axiom 6 (Conditional Independence of Vertices), Axiom 7 (Conditional
Independence of Events) and Axiom 8 (Submodularity). The last column gives
$g(a_1, \ldots, a_k)$ and $h(|P_i|)$ from the characterization in Theorem 6.

Measure of Tie Strength       A1  A2  A3  A4  A5  A6  A7  A8  g and h
Common Neighbors              ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓   $g(a_1, \ldots, a_k) = \sum_{i=1}^{k} a_i$, $h(n) = 1$
Jaccard Index                 ✓   ✓   ✓   ✓   ✓   ✗   ✗   ✗   ✗
Delta                         ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓   $g(a_1, \ldots, a_k) = \sum_{i=1}^{k} a_i$, $h(n) = \frac{1}{\binom{n}{2}}$
Adamic and Adar               ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓   $g(a_1, \ldots, a_k) = \sum_{i=1}^{k} a_i$, $h(n) = \frac{1}{\log n}$
Katz Measure                  ✓   ✗   ✓   ✓   ✓   ✓   ✗   ✗   ✗
Preferential attachment       ✓   ✓   ✗   ✓   ✓   ✓   ✗   ✗   ✗
Random Walk with Restarts     ✓   ✗   ✗   ✗   ✓   ✓   ✗   ✗   ✗
Simrank                       ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗
Max                           ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓   $g(a_1, \ldots, a_k) = \max_{i=1}^{k} a_i$, $h(n) = \frac{1}{n}$
Linear                        ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓   $g(a_1, \ldots, a_k) = \sum_{i=1}^{k} a_i$, $h(n) = \frac{1}{n}$
Proportional                  ✓   ✗   ✗   ✓   ✗   ✓   ✗   ✗   ✗

Table 1: Measures of tie strength and the axioms they satisfy



Table 1 classifies all of these tie-strength measures according
to which axioms they satisfy. If a measure satisfies all the
axioms, we use Theorem 6 to find its characterizing functions
g and h. One interesting observation is that all the
"self-referential" measures (such as Katz Measure, Random Walk
with Restarts, Simrank, and Proportional) fail to satisfy the
axioms. Another is in the classification of the measures that
do satisfy the axioms: the majority use $g = \sum$ to aggregate
tie strength across events and, per event, compute tie strength
as one over a simple function of the size of the party.
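As a rough illustration of this characterization, the sketch below computes tie strength as g applied to the per-event values h(|P|) over the events that two people share. The (g, h) pairs follow Table 1; the function name `tie_strengths`, the `MEASURES` dictionary, and the toy events are assumptions made for illustration only.

```python
import math
from itertools import combinations


def tie_strengths(events, g, h):
    """Return {(u, v): g([h(|P|) for each event P shared by u and v])}.

    Pairs that share no event are simply absent from the result.
    """
    shared = {}
    for event in events:
        for u, v in combinations(sorted(event), 2):
            shared.setdefault((u, v), []).append(h(len(event)))
    return {pair: g(values) for pair, values in shared.items()}


# (g, h) pairs for the measures that satisfy all axioms in Table 1.
MEASURES = {
    "common_neighbors": (sum, lambda n: 1.0),
    "delta": (sum, lambda n: 1.0 / math.comb(n, 2)),
    "adamic_adar": (sum, lambda n: 1.0 / math.log(n)),
    "linear": (sum, lambda n: 1.0 / n),
    "max": (max, lambda n: 1.0 / n),
}

# Toy data: a and b meet once alone and once in a group of four.
events = [{"a", "b"}, {"a", "b", "c", "d"}, {"c", "d"}]
for name, (g, h) in MEASURES.items():
    print(name, round(tie_strengths(events, g, h)[("a", "b")], 3))
```

Only events with at least two attendees contribute a pair, so the logarithm and the binomial coefficient in h are never evaluated at a size below 2.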



6. EXPERIMENTS

This section presents our findings on five data sets: three
Shakespearean plays (Macbeth, Tempest, and A Comedy of
Errors), Reality Mining, and Enron Emails.

6.1 Data Sets

Shakespearean Plays. We take three well-known plays by
Shakespeare (Macbeth, Tempest, and A Comedy of Errors)
and create bipartite person × event graphs. The person nodes
are the characters in the play. Each event is a set of
characters who are on the stage at the same time. We calculate
the strength of ties between each pair of nodes. Thus, without
using any semantic information and without analyzing any
dialogue, we estimate how much characters interact with one
another.
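The event construction described for the plays (each event is the set of characters on stage together) might be sketched as below. How the stage directions were actually parsed is not stated, so the `(action, character)` input format, both function names, and the toy directions are hypothetical.

```python
def events_from_stage_directions(directions):
    """Turn ("enter" | "exit", character) pairs into a list of events.

    An event is recorded each time the set of characters on stage changes
    and at least two characters are present.
    """
    on_stage = set()
    events = []
    for action, character in directions:
        if action == "enter":
            on_stage.add(character)
        elif action == "exit":
            on_stage.discard(character)
        else:
            raise ValueError(f"unknown action: {action!r}")
        if len(on_stage) >= 2:
            events.append(frozenset(on_stage))
    return events


def attendance(events):
    """Person side of the bipartite graph: person -> indices of events attended."""
    gamma = {}
    for index, event in enumerate(events):
        for person in event:
            gamma.setdefault(person, set()).add(index)
    return gamma


# Invented directions, not the actual order of entrances in the play.
directions = [
    ("enter", "Macbeth"),
    ("enter", "Lady Macbeth"),
    ("enter", "Banquo"),
    ("exit", "Banquo"),
]
events = events_from_stage_directions(directions)
gamma = attendance(events)
print(len(events), sorted(gamma["Macbeth"] & gamma["Lady Macbeth"]))
```

The intersection of two attendance sets gives the shared events on which a tie-strength measure operates.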



The Reality Mining Project. This is the popular dataset
from the Reality Mining project at MIT [Eagle, Pentland,
and Lazer, 2009]. This study gave one hundred smart phones
to participants and logged information generated by these
smart phones for several months. We use the Bluetooth
proximity data generated as part of this project. The Bluetooth
radio was switched on every five minutes and logged other
Bluetooth devices in close proximity. The people are the
participants in the study, and events record the proximity
between people. This gives us a total of 326,248 events.

Figure 2: Inferred weighted social networks between characters in Shakespearean plays
(panels: Macbeth, Tempest, A Comedy of Errors). The thicker an edge, the stronger the tie.
Tie strength was calculated using the tie-strength measure Linear.




Enron Emails. This dataset consists of emails from 150
users at the Enron corporation that were made public
during the Federal Energy Regulatory Commission
investigation. We look at all emails exchanged between Enron
addresses. Each email is an event, and everyone on that
email, i.e., the sender (from) and the receivers (to, cc,
and bcc), is included in that event. This gives a total of
32,471 people and 371,321 events.
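A minimal sketch of turning one email into an event is shown below. The dictionary layout of `email`, the `@enron.com` domain test, and the reading of "between Enron addresses" as "every participant has an Enron address" are assumptions; the addresses are made up.

```python
def email_to_event(email, domain="@enron.com"):
    """Return the set of participants of an email, or None if it is not internal.

    email is assumed to be a dict with a "from" string and optional
    "to", "cc", and "bcc" lists of address strings.
    """
    people = {email["from"]}
    for field in ("to", "cc", "bcc"):
        people.update(email.get(field, []))
    # Normalize addresses so the same person is not counted twice.
    people = {address.strip().lower() for address in people}
    if all(address.endswith(domain) for address in people):
        return frozenset(people)
    return None


internal = {
    "from": "alice@enron.com",
    "to": ["bob@enron.com"],
    "cc": ["Carol@Enron.com"],
}
print(sorted(email_to_event(internal)))
```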

6.2 Measuring Coverage of the Axioms 

In Section 4, we discussed axioms governing tie-strength and 
characterized the axioms in terms of a partial order in The- 
orem 11. We shall now look at an experiment to determine 
the "coverage" of the axioms, in terms of the number of pairs 
of ties that are actually ordered by the partial order. 

For different datasets, we use Theorem 11 to generate a
partial order between all ties. Table 2 shows the percentage of
all tie pairs that are not resolved by the partial order, that
is, pairs for which the partial order cannot tell us whether one
tie is greater or whether the two are equal. Each measure of
tie strength gives a total order on the ties and hence resolves
all the comparisons between pairs of ties. The number of tie
pairs left incomparable in the partial order gives a notion of
how much room the axioms leave open for different tie-strength
functions to differ from each other. Table 2 shows that the
partial order does resolve a very high percentage of the tie
pairs. Also, we see that real-world datasets (e.g., Reality
Mining) have more unresolved tie pairs than the cleaner
Shakespearean plays datasets.
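The coverage count can be sketched as follows. Theorem 11 is not restated in this section, so the dominance rule used here is an assumed reading in the spirit of Axioms 3 and 4: a tie is at least as strong as another if it has at least as many shared events and, after sorting sizes in increasing order, each of its events is no larger than the corresponding event of the other tie. The function names and toy ties are hypothetical.

```python
from itertools import combinations


def dominates(a, b):
    """Assumed partial order on ties given as lists of shared-event sizes.

    a dominates b if a has at least as many events and, pairing the
    sorted sizes, every paired event of a is no larger than that of b.
    """
    a, b = sorted(a), sorted(b)
    return len(a) >= len(b) and all(x <= y for x, y in zip(a, b))


def incomparable_pairs(ties):
    """Count pairs of ties that the assumed partial order leaves unresolved."""
    count = 0
    for a, b in combinations(ties, 2):
        if not dominates(a, b) and not dominates(b, a):
            count += 1
    return count


# Each tie is described by the sizes of the events its endpoints share.
ties = [[2, 4], [3], [2], [5, 5, 5]]
total = len(ties) * (len(ties) - 1) // 2
print(incomparable_pairs(ties), "of", total, "tie pairs are incomparable")
```

In this toy example the tie with three large events cannot be compared with any of the ties that have fewer but smaller events, which is exactly the trade-off between frequency and intimacy that a concrete measure has to settle.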

| Dataset | Tie Pairs | Incomparable Pairs (%) |
|---|---|---|
| Tempest | 14,535 | 275 (1.89) |
| Comedy of Errors | 14,535 | 726 (4.99) |
| Macbeth | 246,753 | 584 (0.23) |
| Reality Mining | 13,794,378 | 1,764,546 (12.79) |

Table 2: Number of tie pairs not resolved by the partial
order. The last column shows the number (and percentage) of
tie pairs on which different tie-strength functions can differ.

Next, we look at two tie-strength functions (Jaccard Index
and Temporal Proportional) which do not obey the axioms.
As previously shown, Theorem 11 implies that these functions
do not obey the partial order. So, there are some tie pairs
in conflict with the partial order. Table 3 shows the number
of tie pairs that are actually in conflict. This experiment
gives us some intuition about how far away a measure is from
the axioms. We see that for these datasets, Temporal
Proportional agrees with the partial order more than the
Jaccard Index does. We can also see that as the size of
the dataset increases, the percentage of conflicts decreases
drastically.




| Dataset | Tie Pairs | Jaccard (%) | Temporal (%) |
|---|---|---|---|
| Tempest | 14,535 | 488 (3.35) | 261 (1.79) |
| Comedy | 14,535 | 1,114 (7.76) | 381 (2.62) |
| Macbeth | 246,753 | 2,638 (1.06) | 978 (0.39) |
| Reality | 13,794,378 | 290,934 (0.02) | 112,546 (0.01) |

Table 3: Number of conflicts between the partial
order and the tie-strength functions Jaccard Index
and Temporal Proportional. The last two columns show
the number (and percentage) of tie pairs in conflict
with the partial order.
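Counting conflicts between a measure and the partial order can be sketched as below. As before, the dominance rule is an assumed reading of the partial order (it is not restated here), and only strictly reversed orderings are counted as conflicts, which is a simplifying choice. The Jaccard scores are hand-computed as |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)| from made-up union sizes.

```python
from itertools import combinations


def dominates(a, b):
    """Assumed partial order: at least as many events, each pairwise no larger."""
    a, b = sorted(a), sorted(b)
    return len(a) >= len(b) and all(x <= y for x, y in zip(a, b))


def count_conflicts(ties):
    """ties: list of (shared_event_sizes, score) pairs.

    A conflict is a pair where one tie strictly dominates the other in the
    partial order but receives a strictly lower score from the measure.
    """
    conflicts = 0
    for (sizes_a, score_a), (sizes_b, score_b) in combinations(ties, 2):
        a_ge_b = dominates(sizes_a, sizes_b)
        b_ge_a = dominates(sizes_b, sizes_a)
        if a_ge_b and not b_ge_a and score_a < score_b:
            conflicts += 1
        elif b_ge_a and not a_ge_b and score_b < score_a:
            conflicts += 1
    return conflicts


# (shared event sizes, Jaccard score = shared events / events in the union)
ties = [
    ([2, 3], 2 / 10),  # many shared events, but both people attend many others
    ([3], 1 / 2),
    ([4], 1 / 4),
]
print(count_conflicts(ties))
```

The first tie dominates the other two in the assumed order yet gets the lowest Jaccard score, because Jaccard also penalizes events that only one of the two people attends.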

6.3 Visualizing Networks 

We obtain the tie strength between characters from
Shakespearean plays using the Linear measure.
Figure 2 shows the inferred weighted social networks. Note 
that the inference is only based on people occupying the 
same stage and not on any semantic analysis of the text. 
The inferred weights (i.e. tie strengths) are consistent with 
the stories. For example, the highest tie strengths are be- 
tween Macbeth and Lady Macbeth in the play Macbeth, be- 
tween Ariel and Prospero in Tempest, and between Dromio 
of Syracuse and Antipholus of Syracuse in A Comedy of 
Errors. 

Figure 3: Frequency distribution of the number of people
per event for the Reality Mining and Enron datasets
(in log-log scale).

Figure 4: Frequency distribution of the number of people
per event for the Shakespearean plays.



6.4 Measuring Correlation among Tie-Strength Functions

Figures 3 and 4 show the frequency distributions of the
number of people at an event. We see that these distributions
are very different for the different graphs (even among the
real-world communication networks, Enron and MIT Reality
Mining). This suggests that different applications might
need different measures of tie strength.

Figure 5 shows Kendall's τ coefficient for the Shakespearean
plays, the Reality Mining data, and the Enron emails.
Depending on the data set, different measures of tie strength
are correlated. For instance, in the "clean" world of
Shakespearean plays, Common Neighbor is the least correlated
measure, while in the "messy" real-world data from Reality
Mining and Enron emails, Max is the least correlated measure.
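For readers who want to reproduce this kind of comparison, the sketch below computes a Kendall's τ coefficient between the scores that two measures assign to the same ties. It uses the simple tau-a variant (concordant minus discordant pairs over all pairs), which is an assumption, since the variant used for Figure 5 is not stated; the toy ties and the function name `kendall_tau` are also illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


# Each tie is described by the sizes of the events its endpoints share.
shared_sizes = [[2, 4], [3], [2], [5, 5, 5]]
linear = [sum(1.0 / n for n in sizes) for sizes in shared_sizes]
maximum = [max(1.0 / n for n in sizes) for sizes in shared_sizes]
print(round(kendall_tau(linear, maximum), 3))
```

Linear and Max disagree on how the tie with three large shared events ranks against ties with a single small event, which pulls the coefficient well below 1 even on this tiny example.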

7. CONCLUSIONS 

We presented an axiomatic approach to the problem of
inferring implicit social networks by measuring tie strength
from bipartite person × event graphs. We characterized
functions that satisfy all axioms and demonstrated a range of
measures that satisfy this characterization. We showed that in
ranking applications, the axioms are equivalent to a natural
partial order, and demonstrated that to settle on a particular
measure, we must make a non-obvious decision about extending
this partial order to a total order, a decision that is best
left to the particular application. We classified measures
found in prior literature according to the axioms that they
satisfy. Finally, our experiments demonstrated the coverage of
the axioms and revealed, through the use of Kendall's Tau
correlation, whether a dataset is well-behaved (so that we do
not have to worry about which tie-strength measure to choose)
or whether we have to be careful about the exact choice of
measure.

Figure 5: Kendall's τ coefficient for Shakespearean plays, the Reality Mining data, and the Enron emails.
The color scale goes from bright green (coefficient = 1) to bright red (coefficient = -1). In the Shakespearean
plays, the least correlated measure is Common Neighbor (as indicated by the red cells in that column). In
the real-world communication networks of Enron and Reality Mining, the least correlated measure is Max
(again as indicated by the red cells in that column). Since the correlation matrices are symmetric, we show
only the upper-triangle entries.



